10 research outputs found

CrypTen: Secure Multi-Party Computation Meets Machine Learning

Author: Hannun Awni
Ibrahim Mark
Knott Brian
Sengupta Shubho
van der Maaten Laurens
Venkataraman Shobha
Publication date: 15/09/2022

Secure multi-party computation (MPC) allows parties to perform computations on data while keeping that data private. This capability has great potential for machine-learning applications: it facilitates training of machine-learning models on private data sets owned by different parties, evaluation of one party's private model using another party's private data, etc. Although a range of studies implement machine-learning models via secure MPC, such implementations are not yet mainstream. Adoption of secure MPC is hampered by the absence of flexible software frameworks that "speak the language" of machine-learning researchers and engineers. To foster adoption of secure MPC in machine learning, we present CrypTen: a software framework that exposes popular secure MPC primitives via abstractions that are common in modern machine-learning frameworks, such as tensor computations, automatic differentiation, and modular neural networks. This paper describes the design of CrypTen and measures its performance on state-of-the-art models for text classification, speech recognition, and image classification. Our benchmarks show that CrypTen's GPU support and high-performance communication between (an arbitrary number of) parties allow it to perform efficient private evaluation of modern machine-learning models under a semi-honest threat model. For example, two parties using CrypTen can securely predict phonemes in speech recordings using Wav2Letter faster than real-time. We hope that CrypTen will spur adoption of secure MPC in the machine-learning community.
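
To make "computing on data while keeping that data private" concrete, the following is a minimal sketch of additive secret sharing, one common secure MPC primitive. It is an illustration only and does not use or reproduce CrypTen's API; the names `share`, `reconstruct`, `add_shared`, and the ring size `MODULUS` are assumptions chosen for this example.

```python
import secrets

# Ring size for the shares; an assumption for this sketch.
MODULUS = 2 ** 64


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


x_shares = share(25)
y_shares = share(17)
total = reconstruct(add_shared(x_shares, y_shares))
```

Here each party holds one entry of `x_shares` and one of `y_shares`, and only the final sum is revealed on reconstruction. Multiplication and non-linear functions need additional protocols that this sketch does not cover.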

arXiv.org e-Print Archive

Large Language Models for Software Engineering: Survey and Open Problems

Author: Fan Angela
Gokkaya Beliz
Harman Mark
Lyubarskiy Mitya
Sengupta Shubho
Yoo Shin
Zhang Jie M.
Publication date: 11/11/2023

This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity, with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.

arXiv.org e-Print Archive

Private Matching for Compute

Author: Andrew Knox
Erik Taubeneck
Payman Mohassel
Prasad Buddhavarapu
Shubho Sengupta
Vlad Vlaskin
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 22/05/2020

We revisit the problem of two-party private set intersection for aggregate computation, which we refer to as private matching for compute. In this problem, two parties want to perform various downstream computations on the intersection of their two datasets according to a previously agreed-upon identifier. We observe that prior solutions to this problem have important limitations. For example, any change or update to the records in either party's dataset triggers a rerun of the private matching component; and it is not clear how to support a streaming arrival of one party's set in small batches without revealing the match rate for each individual batch. We introduce two new formulations of the private matching for compute problem meeting these requirements, called private-ID and streaming private secret shared set intersection, and design new DDH-based constructions for both. Our implementation shows that when taking advantage of the inherent parallelizability of these solutions, we can execute the matching for datasets of size up to 100 million records within an hour.
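
As background for the "DDH-based constructions" mentioned above, here is a toy sketch of the classic Diffie-Hellman-style matching idea: each party blinds hashed identifiers with a secret exponent, and because exponentiation commutes, doubly blinded values are equal exactly when the identifiers are equal. This is not the paper's private-ID or streaming protocol. The modulus `P`, the helper names, and the single-process simulation of both parties are assumptions for illustration, and the parameters are not a secure instantiation.

```python
import hashlib
import math
import secrets

# A well-known prime (2^255 - 19); a toy parameter choice, not the paper's group.
P = 2 ** 255 - 19


def keygen():
    """Pick a secret exponent coprime to P - 1 so that blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def hash_to_group(identifier):
    """Map an identifier string to an integer modulo P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P


def blind(elements, key):
    """Raise every element to the secret key modulo P."""
    return [pow(e, key, P) for e in elements]


def match(ids_a, ids_b):
    """Simulate both parties in one process and return A's matched identifiers."""
    key_a, key_b = keygen(), keygen()
    # Each party blinds its own hashed identifiers once and sends them over.
    a_once = blind([hash_to_group(i) for i in ids_a], key_a)
    b_once = blind([hash_to_group(i) for i in ids_b], key_b)
    # Each party then applies its own key to the other party's values.
    a_twice = blind(a_once, key_b)
    b_twice = set(blind(b_once, key_a))
    # Doubly blinded values coincide exactly for shared identifiers.
    return [i for i, v in zip(ids_a, a_twice) if v in b_twice]


matched = match(["a@x.com", "b@x.com", "c@x.com"],
                ["c@x.com", "d@x.com", "a@x.com"])
```

Unlike this sketch, which reveals the intersection itself, the formulations described in the abstract are designed so that downstream computation can proceed without rerunning the matching and without exposing per-batch match rates.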

Cryptology ePrint Archive

Delegated Private Matching for Compute

Author: Benjamin Case
Daniel Masny
Dimitris Mouris
Ni Trieu
Prasad Buddhavarapu
Shubho Sengupta
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 14/11/2023

Private matching for compute (PMC) establishes a match between two datasets owned by mutually distrusted parties ($C$ and $P$) and allows the parties to input more data for the matched records for arbitrary downstream secure computation without rerunning the private matching component. The state-of-the-art PMC protocols only support two parties and assume that both parties can participate in computationally intensive secure computation. We observe that such operational overhead limits the adoption of these protocols to solely powerful entities as small data owners or devices with minimal computing power will not be able to participate. We introduce two protocols to delegate PMC from party $P$ to untrusted cloud servers, called delegates, allowing multiple smaller $P$ parties to provide inputs containing identifiers and associated values. Our Delegated Private Matching for Compute protocols, called DPMC and D$_s$PMC, establish a join between the datasets of party $C$ and multiple delegators $P$ based on multiple identifiers and compute secret shares of associated values for the identifiers that the parties have in common. We introduce a rerandomizable encrypted oblivious pseudorandom function (OPRF) primitive, called EO, which allows two parties to encrypt, mask, and shuffle their data. Note that EO may be of independent interest. Our D$_s$PMC protocol limits the leakages of DPMC by combining our EO scheme and secure three-party shuffling. Finally, our implementation demonstrates the efficiency of our constructions by outperforming related works by approximately $10\times$ for the total protocol execution and by at least $20\times$ for the computation on the delegators.

Cryptology ePrint Archive

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Author: Dubey Pradeep
Hassaan M. Amber
Park Jongsoo
Patwary Md. Mostofa Ali
Satish Nadathur
Sengupta Shubho
Seo Jiwon
Sundaram Narayanan
Yin Zhaoming
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/06/2014

Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.

Crossref

ScholarWorks@UNIST